12. Source: Downloading Files from the Internet
Introduction
HTTP (Hypertext Transfer Protocol)
HTTP, the Hypertext Transfer Protocol, is the language that web browsers (like Chrome or Safari) and web servers (the computers where a website's contents are stored) use to talk to each other. Every time you open a web page, download a file, or watch a video, HTTP is what makes it possible.
HTTP is a request/response protocol:
- Your computer, a.k.a. the client, sends a request to a server for some file. For this lesson: "Get me the file 1-the-wizard-of-oz-1939-film.txt", for example. GET is the name of the HTTP request method (one of several) used for retrieving data.
- The web server sends back a response. If the request is valid: "Here is the file you asked for:", followed by the contents of the 1-the-wizard-of-oz-1939-film.txt file itself.
[Figure: client-server request/response diagram (img/client-server.png). Source: MDN web docs]
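To make the request/response exchange more concrete, here is a minimal sketch of what the two messages look like as text. It does not touch the network: the host name example.com, the helper names build_get_request and parse_response, and the sample response body are all assumptions made up for illustration, not the real server or file contents from this lesson.

```python
def build_get_request(host, path):
    """Return the text of a minimal HTTP/1.1 GET request."""
    return (
        f"GET {path} HTTP/1.1\r\n"   # request line: method, path, protocol version
        f"Host: {host}\r\n"          # Host header is required in HTTP/1.1
        "Connection: close\r\n"
        "\r\n"                       # blank line ends the headers
    )


def parse_response(raw):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split(" ")[1])
    headers = dict(line.split(": ", 1) for line in header_lines)
    return status_code, headers, body


# Hypothetical host; the file name is the one used in this lesson.
request_text = build_get_request("example.com", "/1-the-wizard-of-oz-1939-film.txt")

# Hand-written stand-in for what a server might send back.
sample_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "Placeholder review text"
)
status_code, headers, body = parse_response(sample_response)

print(request_text)
print(status_code, headers["Content-Type"], body)
```

In practice you rarely build these messages by hand; a library such as Requests writes the request and parses the response for you.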
If you'd like to learn more, or feel there are knowledge gaps you'd like to fill in, I encourage you to check out Lesson 1 ("How the Web Works") of our free Web Development course.
Requests: HTTP for Humans
Quiz
In the Jupyter Notebook below, programmatically download all of the Roger Ebert review text files to a folder called ebert_reviews using the Requests library. Use a for loop in conjunction with the provided ebert_review_urls list.
Here is the Requests documentation for easy reference. It is notably clear compared with the documentation for similar libraries, like urllib.
Solution
Source Downloading Files From The Internet II
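One possible approach to the quiz can be sketched as follows. This is a minimal sketch, not necessarily the course's own solution. It assumes each URL ends in the file name to save under; the function name download_reviews and its get parameter are hypothetical, added so the loop can be tried without network access. By default it uses requests.get, and the ebert_review_urls list is the one provided in the notebook.

```python
import os


def download_reviews(urls, folder_name="ebert_reviews", get=None):
    """Download each URL into folder_name and return the saved file paths."""
    if get is None:
        import requests  # third-party library: pip install requests
        get = requests.get

    # Create the target folder if it does not exist yet.
    os.makedirs(folder_name, exist_ok=True)

    saved_paths = []
    for url in urls:
        response = get(url)
        response.raise_for_status()  # stop on 4xx/5xx instead of saving an error page

        # Assumption: the last part of the URL is the file name.
        file_name = url.split("/")[-1]
        file_path = os.path.join(folder_name, file_name)

        # response.content is bytes, so the file is opened in 'wb' mode.
        with open(file_path, mode="wb") as file:
            file.write(response.content)
        saved_paths.append(file_path)
    return saved_paths


# In the notebook, with the provided list:
# download_reviews(ebert_review_urls)
```

Writing response.content in 'wb' mode saves exactly the bytes the server sent, which sidesteps any text-encoding guesswork.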
More Information
- A text file is downloaded in this example. Binary files (images, for example) are best read and written in other ways.
- Stack Overflow: What does 'wb' mean in this code, using Python?
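As a small illustration of what the 'wb' mode does, the sketch below writes a few raw bytes to a temporary file, reads them back with 'rb', and shows that plain text mode ('w') refuses bytes. The file name example.bin is made up for the example, and this is not a summary of the linked Stack Overflow answer.

```python
import os
import tempfile

# The fixed 8-byte signature that begins every PNG file; used here as sample binary data.
png_signature = b"\x89PNG\r\n\x1a\n"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.bin")  # hypothetical file name

    # 'wb' = write + binary: bytes are stored exactly as given.
    with open(path, mode="wb") as file:
        file.write(png_signature)

    # 'rb' = read + binary: the same bytes come back unchanged.
    with open(path, mode="rb") as file:
        round_trip = file.read()

    # Text mode expects str, so writing bytes raises TypeError.
    try:
        with open(path, mode="w") as file:
            file.write(png_signature)
        text_mode_error = None
    except TypeError as error:
        text_mode_error = error

print(round_trip == png_signature)
print(type(text_mode_error).__name__)
```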